AITopics | global minima exist and sgd


Bad Global Minima Exist and SGD Can Reach Them

Neural Information Processing Systems, Dec-24-2025, 02:42:57 GMT

Several works have aimed to explain why overparameterized neural networks generalize well when trained by Stochastic Gradient Descent (SGD). The consensus explanation that has emerged credits the randomized nature of SGD for the bias of the training process towards low-complexity models and, thus, for implicit regularization. We take a careful look at this explanation in the context of image classification with common deep neural network architectures. We find that if we do not regularize explicitly, then SGD can easily be made to converge to poorly-generalizing, high-complexity models: all it takes is to first train on a random labeling of the data, before switching to properly training with the correct labels. In contrast, we find that in the presence of explicit regularization, pretraining with random labels has no detrimental effect on SGD. We believe that our results give evidence that explicit regularization plays a far more important role in the success of overparameterized neural networks than has been understood until now. Specifically, in suppressing complicated models that got lucky with the training data, regularization not only makes simple models that fit the data well the global optima, but it also clears the way to make them discoverable by local methods, such as SGD.
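The two-phase procedure described in the abstract (train on random labels first, then switch to the correct labels) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the one-hidden-layer network, the function names (`init_params`, `loss_and_grads`, `sgd`, `two_phase_training`), the hyperparameter defaults, the use of L2 weight decay as the explicit regularizer, and the choice to run the random-label phase without regularization are all hypothetical.

```python
import numpy as np

def init_params(n_in, n_hidden, n_classes, rng):
    # He-style random initialization for a one-hidden-layer ReLU network.
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(2.0 / n_hidden), (n_hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }

def loss_and_grads(p, X, y):
    # Forward pass: ReLU hidden layer followed by softmax cross-entropy.
    h_pre = X @ p["W1"] + p["b1"]
    h = np.maximum(h_pre, 0.0)
    logits = h @ p["W2"] + p["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    n = X.shape[0]
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    # Backward pass.
    d_logits = probs.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    d_h = d_logits @ p["W2"].T
    d_h[h_pre <= 0] = 0.0
    grads = {
        "W1": X.T @ d_h,
        "b1": d_h.sum(axis=0),
        "W2": h.T @ d_logits,
        "b2": d_logits.sum(axis=0),
    }
    return loss, grads

def sgd(p, X, y, epochs, lr, batch_size, weight_decay, rng):
    # Plain minibatch SGD; weight_decay > 0 adds an explicit L2 penalty
    # on the weight matrices (biases are left unregularized).
    p = {k: v.copy() for k, v in p.items()}
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            _, g = loss_and_grads(p, X[idx], y[idx])
            for k in p:
                decay = weight_decay * p[k] if k.startswith("W") else 0.0
                p[k] -= lr * (g[k] + decay)
    return p

def two_phase_training(X, y_true, n_classes, n_hidden, rng,
                       pre_epochs=50, epochs=50, lr=0.1,
                       batch_size=32, weight_decay=0.0):
    # Phase 1: fit a random labeling, starting from a random initialization.
    # Assumption: this phase uses no explicit regularization.
    y_random = rng.integers(0, n_classes, size=y_true.shape[0])
    p0 = init_params(X.shape[1], n_hidden, n_classes, rng)
    p_pre = sgd(p0, X, y_random, pre_epochs, lr, batch_size, 0.0, rng)
    # Phase 2: continue from the phase-1 weights, now with the correct labels.
    p_final = sgd(p_pre, X, y_true, epochs, lr, batch_size, weight_decay, rng)
    return p_pre, p_final
```

Calling `two_phase_training` once with `weight_decay=0.0` and once with a positive value, then comparing held-out accuracy of the two `p_final` results, mirrors the comparison the abstract describes. Whether a toy model of this size reproduces the reported gap is not something this sketch establishes.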


Review for NeurIPS paper: Bad Global Minima Exist and SGD Can Reach Them

Neural Information Processing Systems, Jan-24-2025, 23:29:22 GMT

Weaknesses:
- The paper claims to have shown for the first time that models that perfectly fit the training set can generalize to different degrees depending on the initialization. This has been shown previously, also with a similar technique. See, for example, "Theoretical issues in deep networks" by Poggio et al. (in PNAS), which shows (among other things) that, depending on the standard deviation of the distribution used to initialize the weights, the network converges to global minima with different test accuracy (see Fig. 2). Also, "Classical Generalization Bounds Are Surprisingly Tight For Deep Networks" by Liao et al. (CBMM Memo) introduces the training sequence "Random initialization → Training with random labels → Training with true labels" and goes further: they show that the test accuracy after training with the true labels varies with the number of images whose labels were randomized (see Section 2).
- [...] Fig. 2 and 3) and tables are hard to extract conclusions from quickly (Table 1).
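The variation the reviewer mentions, randomizing the labels of only some of the images before the true-label phase, can be illustrated with a small helper. This is a hypothetical sketch: the name `corrupt_labels` and its parameters are assumptions made for illustration, and the cited memo's exact corruption scheme is not described in the review.

```python
import numpy as np

def corrupt_labels(y, fraction, n_classes, rng):
    # Return a copy of y in which a given fraction of entries is replaced
    # by labels drawn uniformly at random. A replaced label can coincide
    # with the original one by chance, so the number of entries that
    # actually change is at most round(fraction * len(y)).
    y_out = np.array(y, copy=True)
    k = int(round(fraction * y_out.shape[0]))
    idx = rng.choice(y_out.shape[0], size=k, replace=False)
    y_out[idx] = rng.integers(0, n_classes, size=k)
    return y_out
```

Sweeping `fraction` from 0 to 1 yields a family of pretraining label sets, from fully correct to fully random, for the first training phase.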


Bad Global Minima Exist and SGD Can Reach Them

Neural Information Processing Systems, Oct-10-2024, 08:26:58 GMT
